Optimized Parallel Prefix Sum Algorithm on Optoelectronic Biswapped-Torus Architecture
Related resources
Architecture Description and Prototype Demonstration of Optoelectronic Parallel-Matching Architecture
We propose an optoelectronic parallel-matching architecture (PMA) that provides more powerful processing capability for distributed algorithms than traditional parallel computing architectures. The PMA is composed of a parallel-matching (PM) module and multiple processing elements (PEs). The PM module is implemented by a large-fan-out free-space optical interconnection and parallel-matchi...
Improved Parallel Prefix Algorithm on OTIS-Mesh of Trees
A parallel algorithm for prefix computation was recently reported on an interconnection network called the OTIS-Mesh of Trees [4]. Using n^4 processors, the algorithm was shown to run in 13 log n + O(1) electronic moves and 2 optical moves for n^4 data points. In this paper we present a new and improved parallel algorithm for prefix computation on the OTIS-Mesh of Trees. The algorithm requires 10 log n + O(1) electronic steps + 1 optic...
A Parallel Matrix Inversion Algorithm on Torus with Adaptive Pivoting
This paper presents a parallel algorithm for matrix inversion on a torus-interconnected MIMD-MC multiprocessor. The method is faster than parallel implementations of other widely used methods, namely Gauss-Jordan, Gauss-Seidel, or LU-decomposition-based inversion. The new algorithm also introduces a novel technique, called adaptive pivoting, for solving the zero-pivot problem at no cost. O...
An Efficient VLSI Architecture Parallel Prefix Counting With Domino Logic
We propose an efficient reconfigurable parallel prefix counting network based on the recently proposed technique of shift switching with domino logic, where the charge/discharge signals propagate along the switch chain producing semaphores, resulting in a network that is fast and highly hardware-compact. The proposed architecture for prefix counting N − 1 bits features a total delay of (4 log N + √N − 2...
Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)
Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms, including polynomial evaluation, sorting, and building data structures. This paper introduces prefix scan and describes a step-by-step procedure for implementing prefix scan efficiently with Compute Unified Device Architecture (CUDA). This paper starts with a basic naive algorithm and procee...
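To make the prefix sum operation concrete, here is a minimal sketch of a naive doubling-offset scan (often called a Hillis-Steele scan), simulated sequentially in Python. It is not taken from any of the papers listed here; the function name `inclusive_scan` and the sample input are assumptions made for illustration. Each pass of the outer loop stands in for one parallel step in which every element would be updated at the same time.

```python
def inclusive_scan(values):
    """Inclusive prefix sum using a doubling offset (illustrative sketch).

    Sequential simulation: each pass of the while loop corresponds to one
    step that a parallel machine would apply to all elements at once.
    """
    result = list(values)
    n = len(result)
    offset = 1
    while offset < n:
        # Snapshot the previous step so every update reads old values only,
        # mimicking simultaneous updates on parallel hardware.
        previous = result[:]
        for i in range(offset, n):
            result[i] = previous[i] + previous[i - offset]
        offset *= 2
    return result


print(inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# prints [3, 4, 11, 11, 15, 16, 22, 25]
```

The loop runs about log2(n) passes, which is why scans of this kind take a logarithmic number of parallel steps, at the cost of more total additions than a single sequential pass.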
Journal
Journal title: Vietnam Journal of Computer Science
Year: 2020
ISSN: 2196-8888,2196-8896
DOI: 10.1142/s2196888821500159